深度学习的繁荣有助于场景文本检测的快速进步。在所有具有卷积网络的方法中,基于细分的方法在检测任意形状和极端纵横比的文本实例方面的优越性,引起了广泛的关注。但是,自下而上的方法仅限于其分割模型的性能。在本文中,我们提出了DPTNET(双路线变压器网络),这是一种简单而有效的体系结构,可为场景文本检测任务建模全局和本地信息。我们进一步提出了一种平行的设计,将卷积网络与强大的自我发场机制相结合,以在注意力路径和卷积路径之间提供互补的线索。此外,开发了两个路径上的双向相互作用模块,以提供通道和空间尺寸的互补线索。我们还通过向其添加额外的多头注意力层来升级集中操作。我们的DPTNET在MSRA-TD500数据集上实现了最先进的结果,并就检测准确性和速度提供了其他标准基准的竞争结果。
translated by 谷歌翻译
车辆到所有(V2X)网络已使自主驾驶中的合作感达到了协作感,这是对独立情报的根本缺陷的有前途的解决方案,包括盲区和远距离感知。但是,缺乏数据集严重阻碍了协作感知算法的发展。在这项工作中,我们发布了海豚:用于协作感知的数据集,可以使和谐且相互联系的自动驾驶,这是一个新的模拟大规模的各种大规模的各种赛车多模式多模式自动驾驶数据集,该数据集为互连为互连的开创性基准平台提供自动驾驶。海豚在六个维度上优于当前数据集:从车辆和道路侧单元(RSU)(RSUS)的临时图像和点云,启用车辆到车辆(V2V)和车辆到基础设施(V2I)的协作感知; 6具有动态天气条件的典型场景使各种互连的自动驾驶数据集最多;精心选择的观点,提供关键区域和每个对象的全部覆盖范围; 42376帧和292549个对象,以及相应的3D注释,地理位置和校准,构成了最大的协作知觉数据集;全高清图像和64线激光雷达构建高分辨率数据,并具有足够的详细信息;组织良好的API和开源代码可确保海豚的可扩展性。我们还构建了2D检测,3D检测和关于海豚的多视图协作任务的基准。实验结果表明,通过V2X通信的原始融合方案可以帮助提高精度,并在RSU存在时减少昂贵的LiDAR设备的必要性,这可能会加速相互联系的自动驾驶车辆的普及。现在可以在https://dolphins-dataset.net/上获得海豚。
translated by 谷歌翻译
本文回顾了关于压缩视频质量增强质量的第一个NTIRE挑战,重点是拟议的方法和结果。在此挑战中,采用了新的大型不同视频(LDV)数据集。挑战有三个曲目。Track 1和2的目标是增强HEVC在固定QP上压缩的视频,而Track 3旨在增强X265压缩的视频,以固定的位速率压缩。此外,轨道1和3的质量提高了提高保真度(PSNR)的目标,以及提高感知质量的2个目标。这三个曲目完全吸引了482个注册。在测试阶段,分别提交了12个团队,8支球队和11支球队,分别提交了轨道1、2和3的最终结果。拟议的方法和解决方案衡量视频质量增强的最先进。挑战的首页:https://github.com/renyang-home/ntire21_venh
translated by 谷歌翻译
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortion and image content variation, which complicate the distortion patterns crossing different scales and aggravate the difficulty of the regression problem for BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies to make the regression model produce better performance. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression issue to align with the law of human learning process from easy to hard. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets, and the experimental results indicate that the performance of PMT-IQA is superior to the comparison approaches, and both MS and PMT modules improve the model's performance.
translated by 谷歌翻译
It has been observed in practice that applying pruning-at-initialization methods to neural networks and training the sparsified networks can not only retain the testing performance of the original dense models, but also sometimes even slightly boost the generalization performance. Theoretical understanding for such experimental observations are yet to be developed. This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization. Specifically, this work considers a classification task for overparameterized two-layer neural networks, where the network is randomly pruned according to different rates at the initialization. It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero and the network exhibits good generalization performance. More surprisingly, the generalization bound gets better as the pruning fraction gets larger. To complement this positive result, this work further shows a negative result: there exists a large pruning fraction such that while gradient descent is still able to drive the training loss toward zero (by memorizing noise), the generalization performance is no better than random guessing. This further suggests that pruning can change the feature learning process, which leads to the performance drop of the pruned neural network. Up to our knowledge, this is the \textbf{first} generalization result for pruned neural networks, suggesting that pruning can improve the neural network's generalization.
translated by 谷歌翻译
Time-series anomaly detection is an important task and has been widely applied in the industry. Since manual data annotation is expensive and inefficient, most applications adopt unsupervised anomaly detection methods, but the results are usually sub-optimal and unsatisfactory to end customers. Weak supervision is a promising paradigm for obtaining considerable labels in a low-cost way, which enables the customers to label data by writing heuristic rules rather than annotating each instance individually. However, in the time-series domain, it is hard for people to write reasonable labeling functions as the time-series data is numerically continuous and difficult to be understood. In this paper, we propose a Label-Efficient Interactive Time-Series Anomaly Detection (LEIAD) system, which enables a user to improve the results of unsupervised anomaly detection by performing only a small amount of interactions with the system. To achieve this goal, the system integrates weak supervision and active learning collaboratively while generating labeling functions automatically using only a few labeled data. All of these techniques are complementary and can promote each other in a reinforced manner. We conduct experiments on three time-series anomaly detection datasets, demonstrating that the proposed system is superior to existing solutions in both weak supervision and active learning areas. Also, the system has been tested in a real scenario in industry to show its practicality.
translated by 谷歌翻译
As an important variant of entity alignment (EA), multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs (KGs) with multiple modalities like images. However, current MMEA algorithms all adopt KG-level modality fusion strategies but ignore modality differences among individual entities, hurting the robustness to potential noise involved in modalities (e.g., unidentifiable images and relations). In this paper we present MEAformer, a multi-modal entity alignment transformer approach for meta modality hybrid, to dynamically predict the mutual correlation coefficients among modalities for instance-level feature fusion. A modal-aware hard entity replay strategy is also proposed for addressing vague entity details. Extensive experimental results show that our model not only achieves SOTA performance on multiple training scenarios including supervised, unsupervised, iterative, and low resource, but also has limited parameters, optimistic speed, and good interpretability. Our code will be available soon.
translated by 谷歌翻译
The task of video prediction and generation is known to be notoriously difficult, with the research in this area largely limited to short-term predictions. Though plagued with noise and stochasticity, videos consist of features that are organised in a spatiotemporal hierarchy, different features possessing different temporal dynamics. In this paper, we introduce Dynamic Latent Hierarchy (DLH) -- a deep hierarchical latent model that represents videos as a hierarchy of latent states that evolve over separate and fluid timescales. Each latent state is a mixture distribution with two components, representing the immediate past and the predicted future, causing the model to learn transitions only between sufficiently dissimilar states, while clustering temporally persistent states closer together. Using this unique property, DLH naturally discovers the spatiotemporal structure of a dataset and learns disentangled representations across its hierarchy. We hypothesise that this simplifies the task of modeling temporal dynamics of a video, improves the learning of long-term dependencies, and reduces error accumulation. As evidence, we demonstrate that DLH outperforms state-of-the-art benchmarks in video prediction, is able to better represent stochasticity, as well as to dynamically adjust its hierarchical and temporal structure. Our paper shows, among other things, how progress in representation learning can translate into progress in prediction tasks.
translated by 谷歌翻译
Implicit regularization is an important way to interpret neural networks. Recent theory starts to explain implicit regularization with the model of deep matrix factorization (DMF) and analyze the trajectory of discrete gradient dynamics in the optimization process. These discrete gradient dynamics are relatively small but not infinitesimal, thus fitting well with the practical implementation of neural networks. Currently, discrete gradient dynamics analysis has been successfully applied to shallow networks but encounters the difficulty of complex computation for deep networks. In this work, we introduce another discrete gradient dynamics approach to explain implicit regularization, i.e. landscape analysis. It mainly focuses on gradient regions, such as saddle points and local minima. We theoretically establish the connection between saddle point escaping (SPE) stages and the matrix rank in DMF. We prove that, for a rank-R matrix reconstruction, DMF will converge to a second-order critical point after R stages of SPE. This conclusion is further experimentally verified on a low-rank matrix reconstruction problem. This work provides a new theory to analyze implicit regularization in deep learning.
translated by 谷歌翻译
Gradient-based explanation is the cornerstone of explainable deep networks, but it has been shown to be vulnerable to adversarial attacks. However, existing works measure the explanation robustness based on $\ell_p$-norm, which can be counter-intuitive to humans, who only pay attention to the top few salient features. We propose explanation ranking thickness as a more suitable explanation robustness metric. We then present a new practical adversarial attacking goal for manipulating explanation rankings. To mitigate the ranking-based attacks while maintaining computational feasibility, we derive surrogate bounds of the thickness that involve expensive sampling and integration. We use a multi-objective approach to analyze the convergence of a gradient-based attack to confirm that the explanation robustness can be measured by the thickness metric. We conduct experiments on various network architectures and diverse datasets to prove the superiority of the proposed methods, while the widely accepted Hessian-based curvature smoothing approaches are not as robust as our method.
translated by 谷歌翻译